Abstract: Preclinical researchers 
confront two overarching agendas 
related to drug development: se- 
lecting interventions amid a vast 
field of candidates, and producing 
rigorous evidence of clinical prom- 
ise for a small number of interven- 
tions. We suggest that each chal- 
lenge is best met by two different, 
complementary modes of investi- 
gation. In the first (exploratory 
investigation), researchers should 
aim at generating robust patho- 
physiological theories of disease. In 
the second (confirmatory investiga- 
tion), researchers should aim at 
demonstrating strong and repro- 
ducible treatment effects in rele- 
vant animal models. Each mode 
entails different study designs, con- 
fronts different validity threats, and 
supports different kinds of infer- 
ences. Research policies should 
seek to disentangle the two modes 
and leverage their complementari- 
ty. In particular, policies should 
discourage the common use of 
exploratory studies to support con- 
firmatory inferences, promote a 
greater volume of confirmatory 
investigation, and customize de- 
sign and reporting guidelines for 
each mode. 



Introduction 

The past few years have witnessed 
growing consternation over the way re- 
searchers perform and report preclinical 
investigations of new drugs. The vast 
majority of drugs advanced into trials 
never recapitulate safety and efficacy 
observed in animal models, and these 
failures exact a heavy toll on trial volunteers, 
the research enterprise, and health 
care systems via higher drug prices. 
Because many preclinical studies poorly 
address internal validity threats [1], fail 
attempts at replication [2], are not 
published [3], or provide exaggerated 
estimates of clinical utility, numerous 
stakeholders are urging reforms in the 
way preclinical research is performed [4]. 

We would like to offer a cautionary 
perspective on these initiatives. We suggest 
that the ostensibly poor performance of 
many preclinical studies may in fact reflect 
strengths and intrinsic properties of what 
we call "exploratory investigation" — 
roughly, studies aimed at generating 
robust pathophysiological theories of 
disease. Policies aimed at improving trans- 
lation should strive to preserve the ex- 
traordinary power of exploratory studies, 
which represent the majority of preclinical 
studies [5], while promoting a separate 
mode of clinical trial-like preclinical re- 
search, which we call "confirmatory" 
studies — that is, studies aimed at demon- 
strating strong and reproducible treatment 
effects in relevant animal models. We close 
by describing some ways of capitalizing on 
the complementarity of the two modes. 

Exploratory Versus 
Confirmatory Research 

Clinical translation of novel interventional 
strategies confronts two overarching 
challenges. First, researchers must negotiate 
a virtually unbounded landscape of 
potential targets, drugs, doses, and treat- 
ment regimens. A key task is to develop 
the theories, measurement techniques, and 
evidence for selecting a manageable num- 
ber of interventions to carry forward. 
Second, clinical development is enormous- 
ly expensive and exposes patients to 
unproven and possibly toxic interventions. 
Another key task of preclinical research is 
thus to produce evidence that is sufficiently 
compelling to warrant the economic and 
moral costs of clinical development. 

Overcoming these two challenges ne- 
cessitates different modes of investigation. 
The first set of challenges is best met by 
studies that operate in the exploratory mode. 
We use "exploratory" to capture some- 
thing broader than what is generally 
meant in statistics. In our conception, 
exploratory studies will aim primarily at 
developing pathophysiological theories 
that enable pursuit of different approach- 
es. Exploratory studies tend to consist of a 
package of small and flexible experiments 
using different methodologies, including 
molecular and cellular analyses. These 
individual experiments may or may not 
employ inferential statistics. Exploratory 
studies are often driven by a series of 
hypotheses that are either loosely articu- 
lated or that evolve over the course of 
sequential experiments. Often, exploratory 
studies include tests of an intervention's 
efficacy against disease in live animals as a 
way of validating the pathophysiological 
theories ("efficacy studies"). Neither the 
sequence of individual experiments in 
exploratory studies, nor details of their 
design (including sample size, since effect 
sizes may be unknown), is necessarily 
established at the outset of investigation. 

The second set of challenges is best 
overcome by studies that operate in a 
confirmatory mode. Such studies will resem- 
ble adequately powered clinical trials, and 
consist mainly of "efficacy studies" that 
use rigid and pre-specified designs, a priori 
stated hypotheses, prolonged durations, 
and the most clinically relevant assays 
and endpoints available. These studies aim 
less at elaborating theories or mechanisms 
of a drug's action than at rigorously testing a 
drug's clinical potential and keeping 
ineffective interventions from advancing 
into clinical testing. Exploratory 
studies complement confirmatory 
studies in that the former generate 
precisely articulated hypotheses about 
drug effects that can be put to "crucial 
testing" in the latter before clinical 
development. 
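
What "adequately powered" implies for sample size can be sketched with the standard normal-approximation formula for comparing two group means. The effect sizes, significance level, and power below are assumed values chosen for illustration, the function name is hypothetical, and the approximation slightly understates what a t-test would require.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per arm for a two-sided, two-group comparison.

    effect_size is the standardized mean difference (Cohen's d).
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes: a moderate and a large standardized difference.
n_moderate = n_per_group(0.5)
n_large = n_per_group(1.0)
```

Because the required sample size scales with the inverse square of the effect size, halving the expected effect roughly quadruples the number of animals per arm, which is one reason a pivotal efficacy study costs far more than a single exploratory experiment.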

Currently, the vast majority of preclin- 
ical studies more closely resemble explor- 
atory studies, although a small but 
growing number of studies operate in a 
confirmatory mode. These different orien- 
tations carry important imperatives for the 
design, reporting, error tendencies, and 
application of preclinical studies. What 
may be an inferential strength for exploratory 
studies can be a hindrance or even a 
fatal flaw for confirmatory studies and vice 
versa. Policies and practices aimed at 
improving clinical translation should rec- 
ognize at least four major contrasts 
between the two modes of investigation. 

Implications for Design and 
Valid Interpretation 

The first difference has already been 
noted: whereas exploratory studies should 
mainly aim at deriving or testing theoret- 
ical claims, confirmatory studies should 
test clinical utility of new interventions. 
Since theories are not directly observable, 
they are tested by assembling corrobora- 
tory evidence across different lines of 
experimentation. This theoretical orienta- 
tion in preclinical research is reflected in 
the fact that a good part of the acreage in 
publications is devoted to molecular or 
cellular analyses (e.g., gene expression, 
immunohistochemistry, electrophysiology), 
not efficacy studies. Spreading proof 
across different lines of experiment, a 
process called "conceptual replication" [6], 
has several consequences for 
predictive value. On the one hand, threats 
to the validity of theoretical claims driving 
a preclinical study are mitigated — though 
not eliminated — by conceptual replica- 
tions. On the other hand, therapeutic 
claims arising from efficacy studies con- 
tained in the exploratory package will be 
prone to larger random and systematic 
variation: such studies invest less in any 
single experiment, and therefore employ 
smaller sample sizes and less fastidious 
designs. In contrast, because confirmatory 
studies "bet the house" on a single, pivotal 
efficacy study and measurement tech- 
nique, there is more at stake scientifically 
in minimizing random and systematic 
error. 

Second, whereas exploratory studies 
should place a premium on sensitivity 
(i.e., detecting all strategies that might be 
useful), confirmatory studies should be 
more concerned with specificity (i.e., 
excluding all strategies that will prove 
useless in clinical trials). This is because 
the task of exploration is to catch a small 
number of promising theories, targets, 
compounds, doses, or variants of a target 
indication against a large field. However, 
in many areas of drug development, the 
prior probability of discovering useful 
strategies is extraordinarily low. This 
means that even in the ideal, where 
exploratory studies have very high sensi- 
tivity and specificity, most candidates that 
are declared promising will represent false 
positives. Since there are large financial 
and human costs for advancing these false 
positives into trials, the task of confirma- 
tion is to eliminate "false positives" that 
are captured in exploration. Further, the 
agonizingly low positive predictive value of 
exploratory studies may have as much to 
do with base rates as it does with bias. 
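
To make the base-rate point concrete, the following is a minimal sketch of the underlying arithmetic (Bayes' rule). The prior probability, sensitivity, and specificity values are hypothetical, chosen only for illustration, and the function name is not taken from any cited source.

```python
def positive_predictive_value(prior, sensitivity, specificity):
    """P(strategy is truly useful | study declares it promising)."""
    true_positives = prior * sensitivity
    false_positives = (1.0 - prior) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical exploratory stage: 1 in 100 candidates is truly useful,
# and studies have 90% sensitivity and 90% specificity.
ppv_exploratory = positive_predictive_value(0.01, 0.90, 0.90)

# Hypothetical confirmatory stage applied only to candidates that passed
# exploration, so the exploratory PPV becomes the new prior.
ppv_confirmatory = positive_predictive_value(ppv_exploratory, 0.80, 0.95)
```

Under these assumed inputs, fewer than one in ten candidates declared promising at the exploratory stage is truly useful, even though no bias has been introduced. Passing the survivors through a second, more specific stage raises that fraction substantially, which is the role the text assigns to confirmation.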

Third, use of small sample sizes for 
efficacy experiments contained in explor- 
atory studies may lead to large random 
variation that produces the appearance of 
bias even in its absence. This dynamic, 
known as the "winner's curse" [7], reflects 
the fact that research in the exploratory 
mode will often test many different 
strategies in parallel, and this is only 
feasible if small sample sizes are used. As 
a consequence of random variation alone, 
some experiments will produce larger 
effects that regress to the mean if replicated. 
In contrast, confirmatory studies 
should employ sample sizes large enough 
to minimize the effect of random 
variation, such that dwindling effect sizes 
on replication may be symptomatic of 
publication bias rather than natural 
regression. 
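
The regression dynamic described above can be illustrated with a small simulation. This is a sketch under stated assumptions: every strategy has the same true standardized effect, outcomes are normally distributed with unit variance, and the sample sizes, selection threshold, and seed are hypothetical values rather than figures from the literature.

```python
import random
import statistics


def observed_effect(true_effect, n_per_group, rng):
    """Difference in group means from one simulated two-arm experiment."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_EFFECT = 0.5        # assumed true standardized effect
SMALL_N = 8              # animals per arm in each exploratory experiment
LARGE_N = 100            # animals per arm in each replication
THRESHOLD = 1.0          # about two standard errors when n = 8 per arm

# Screen many small experiments and keep only the apparent "winners".
screen = [observed_effect(TRUE_EFFECT, SMALL_N, rng) for _ in range(2000)]
winners = [effect for effect in screen if effect > THRESHOLD]

# Replicate each winner once with a much larger sample.
replications = [observed_effect(TRUE_EFFECT, LARGE_N, rng) for _ in winners]

winner_mean = statistics.mean(winners)
replication_mean = statistics.mean(replications)
```

Here the selected experiments overstate the true effect purely because they were selected for being large, and the larger replications fall back toward the true value. No experiment in the simulation is biased, so the shrinkage on replication reflects random variation alone.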



Last, exploratory studies often involve 
testing interventions alongside techniques 
used to measure their effects. In contrast, 
methods should be well established when 
an intervention is tested in confirmatory 
studies. Assays for testing pathophysiolog- 
ical responses, or the probative value of 
biomarkers, or skills for performing a 
behavioral test may be still in development 
at the point of exploratory investigation. 
One example of this is uncertainty sur- 
rounding techniques for testing drugs that 
target cancer stem cells. Here, standard 
assays for testing the clinical promise of 
cancer drugs are almost useless, yet there 
is little consensus about which assays to use 
instead [8]. Another example might be 
where a graduate student conducts exper- 
iments before having mastered the requi- 
site manual skills. As a consequence of 
uncertainty surrounding measurement, 
exploratory researchers encounter difficul- 
ty discriminating informative and uninfor- 
mative findings: "positive" findings may 
be attributable to assay artefacts; "nega- 
tive" findings may reflect defects in the 
measurement tools, choice of the wrong 
treatment regimen, or suboptimal experi- 
menter skill. Since the value of uninfor- 
mative findings for the broader research 
community is limited, the absence of firm 
rules for discrimination legitimately con- 
founds decisions about what findings to 
publish and how to interpret them. Any 
blanket proscription against "hiding" data 
risks obscuring truly interesting findings 
amid a large volume of studies that the 
experimenter knows to be uninformative 
to the broader research community: 
"practice runs," experiments on miscali- 
brated instruments, or findings using 
methods that are later discovered to be 
error prone. On the other hand, where 
researchers have grounds for confidence in 
the regimens for testing, nonpublication of 
negative findings represents a demonstra- 
ble breach of scientific integrity. This will 
tend to be a much greater concern in 
confirmatory testing, since measurement 
techniques tend to be more established in 
that setting. 

In sum, there are many factors that 
explain why preclinical studies are prone 
to producing "false positives" or outcome 
patterns that give the appearance of bias. 
Yet to some degree, these reflect strengths 
of exploratory research, such as its ability 
to narrow the field of intervention candi- 
dates using an economy of resources, to 
select among myriad pathophysiological 
theories, and to hone techniques of 
measuring clinical promise. These are 
necessary precursors to the sorts of 
rigorous confirmatory experiments that 
should be used to justify clinical develop- 
ment. 

Improving Design and 
Interpretation of Preclinical 
Research 

Though some of the above contrasts 
may appear obvious to anyone with a 
basic understanding of statistics and ex- 
perimental design, they are not adequately 
reflected in many reforms urged by critics 
of preclinical research — e.g., calls for using 
larger sample sizes, "gold standard" ani- 
mal models, or independent replication 
[9,10]. Some proposals entail non-trivial 
burdens such as restructuring laboratory 
practices, writing up and/or depositing 
inconclusive findings, or using larger 
sample sizes, and hence undermine the 
economy of exploratory activities. Re- 
forms are more likely to have a transfor- 
mative impact on drug development if 
researchers can capitalize on the complementary 
properties of both exploratory and 
confirmatory studies, and tailor study 
design, reporting, and application of 
findings accordingly. To that end, we offer 
three sets of recommendations. 

First, all protocols and publications 
should pre-specify whether they are "ex- 
ploratory" or "confirmatory" studies, with 
the latter category reserved for studies that 
aim at demonstrating promise of clinical 
utility for an intervention. We note that 
other commentators have made similar 
calls [11,12]. Journal editors and funding 
agencies should promote this demarcation 
by requiring it for submitted manuscripts 
and grants. Standards for review should 
then hinge on the way investigators classify 
studies. For instance, confirmatory studies 
should be held to internal and construct 
validity standards similar to those used in 
clinical trials: studies should address con- 
founders like sample or observation bias, 
use pre-specified statistical analyses, match 
the experimental design to the conditions 
where findings are expected to be applied, 
and report findings in ways that enable 
meaningful interpretation by non-experts. 
Large sample sizes, fastidious experimental 
conditions, and conservative statistical 